
What is the Private Inference API

info

COMING SOON: Bring your own model - GPU required

Understanding the Private Inference API

The Private Inference API is a secure inference service that runs on Nebul’s private NeoCloud, ensuring compliance and data protection. It offers open-source and fine-tuned AI models, ideal for industries handling sensitive information, with seamless integration and transparent pricing.

How the Inference API Works

In a Model-as-a-Service (MaaS) model, Nebul hosts pre-trained models on its private NeoCloud infrastructure. Clients access these models through standardized APIs, sending data and receiving predictions or analyses in real time. This setup eliminates the need for businesses to invest in expensive hardware or dedicate resources to model training and maintenance, offering a scalable and cost-effective way to deploy AI functionality.
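At its core, each request is an authenticated HTTP call: the client sends input (for example, a prompt) and receives the model’s output in the response. The sketch below illustrates this flow against a hypothetical OpenAI-compatible endpoint; the base URL, model identifier, and API key are placeholders for illustration, not Nebul’s actual values.

```python
# Minimal inference-call sketch. The endpoint, model name, and key below
# are hypothetical placeholders, not Nebul's actual values.
import requests

API_BASE = "https://your-private-endpoint.example.com/v1"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                                    # hypothetical key

response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-70b-instruct",  # example open-source model identifier
        "messages": [
            {"role": "user", "content": "Summarize the attached contract clause."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```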

Private Inference API Offering

As a private NeoCloud provider, Nebul is well-positioned to offer MaaS solutions that combine the flexibility of open-source models with the customization required for industry-specific applications. Our Private AI MaaS offering includes:

  • Access to Open-Source Large Language Models (LLMs): We support a variety of open-source LLMs, such as Llama 3.1, Nemotron-4, DeepSeek R1, and many more to come. These models have demonstrated versatility and performance across various tasks, including natural language processing, coding assistance, and data analysis.
  • Model Fine-Tuning and Customization: Beyond providing access to pre-trained models, Nebul offers services to fine-tune these models on industry-specific data. This customization ensures that the AI solutions align closely with the unique requirements and nuances of your business domain, enhancing relevance and performance.

Key Benefits

  • Scalability and Flexibility: Private AI MaaS allows businesses to scale AI usage up or down based on demand, ensuring optimal resource utilization and cost management.
  • Enhanced Security: Operating within our NeoCloud's private infrastructure ensures that your data and AI models are protected by robust security measures, aligning with industry compliance standards.
  • Rapid Deployment: With access to pre-trained and customizable models, businesses can quickly integrate AI functionalities into their applications, accelerating time-to-market.

Getting Started with the Private Inference API

To leverage these offerings:

  1. Consultation: Engage with our team to assess your AI needs and identify suitable models and customization options.
  2. Integration: Seamlessly integrate selected models into your applications through our user-friendly APIs (see the sketch after this list).
  3. Optimization: Benefit from ongoing support and optimization services to ensure that the AI solutions evolve with your business needs.
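As an illustration of step 2, the snippet below shows what integration could look like if the endpoint exposes an OpenAI-compatible interface; that compatibility, along with the base URL, model identifier, and key, is an assumption for illustration rather than a confirmed detail of Nebul’s API.

```python
# Illustrative integration sketch using the OpenAI Python SDK against a
# private, OpenAI-compatible endpoint. Compatibility, base URL, model name,
# and key are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-private-endpoint.example.com/v1",  # hypothetical
    api_key="YOUR_API_KEY",                                    # hypothetical
)

completion = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # example model identifier
    messages=[{"role": "user", "content": "Classify this support ticket by urgency."}],
)
print(completion.choices[0].message.content)
```

If the endpoint follows this widely used convention, existing application code typically only needs a new base URL and API key.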

Pricing Models: Tokens vs. GPU

  • Tokenized model*: Pay only for what you use (read/write tokens). Variable fee.
  • GPU: Dedicated GPU(s) that run your model(s). Fixed fee.

(* note: Tokenized model is not yet available)
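To make the trade-off concrete, here is a rough back-of-the-envelope comparison; every number in it is a hypothetical placeholder, not an actual Nebul rate.

```python
# Hypothetical cost comparison between the two pricing models.
# All rates and volumes below are placeholders, not Nebul's actual pricing.
tokens_per_month = 50_000_000       # assumed total read + write tokens
price_per_million_tokens = 2.00     # hypothetical variable rate (EUR)
gpu_monthly_fee = 1_500.00          # hypothetical fixed fee for a dedicated GPU (EUR)

token_cost = tokens_per_month / 1_000_000 * price_per_million_tokens
print(f"Token-based cost:   {token_cost:,.2f} EUR/month")
print(f"Dedicated GPU cost: {gpu_monthly_fee:,.2f} EUR/month")
# Low or bursty volumes favour the variable token model; sustained high
# volumes favour a dedicated GPU's fixed fee.
```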